We formulate a general model for the growth of scale-free networks under information-filtering conditions, that is, when the nodes can process information about only a subset of the existing nodes in the network. We find that the distribution of the number of incoming links to a node follows a universal scaling form, i.e., it decays as a power law with an exponential truncation controlled not only by the system size but also by a feature not previously considered: the subset of the network "accessible" to the node. We test our model with empirical data for the World Wide Web and find agreement.
There is a great deal of current interest in understanding the structure and growth mechanisms of global networks […], such as the World Wide Web (WWW) […] and the Internet […]. Network structure is critical in many contexts, such as Internet attacks, the spread of e-mail viruses, or the dynamics of human epidemics […]. In all these problems, the nodes with the largest number of links play an important role in the dynamics of the system. It is therefore important to know the global structure of the network as well as the precise distribution of the number of links.

Recent empirical studies report that both the Internet and the WWW have scale-free properties, that is, the number of incoming links and the number of outgoing links at a given node have distributions that decay with power-law tails […]. It has been proposed […] that the scale-free structure of the Internet and the WWW may be explained by a mechanism referred to as "preferential attachment" […], in which new nodes link to existing nodes with a probability proportional to the number of existing links to these nodes. Here we focus on the stochastic character of the preferential attachment mechanism, which we understand in the following way: new nodes want to connect to the existing nodes with the largest number of links (i.e., with the largest degree) because of the advantages offered by being linked to a well-connected node. For a large network it is not plausible that a new node will know the degrees of all existing nodes, so a new node must decide which node to connect to based on the information it has about the state of the network. The preferential attachment mechanism then comes into play, as nodes with larger degree are more likely to become known.

This picture has one underlying and unstated assumption: that the new nodes will process (i.e., gather, store, retrieve and analyze) information concerning the state of the entire network. For very large networks, such as the WWW or the scientific literature, this would correspond to the unrealistic situation in which new nodes can process an extremely large amount of information, i.e., have unlimited information-processing capabilities. Indeed, it is likely that nodes have limited information-processing capabilities and so must filter incoming information according to their particular "interests". Thus, new nodes of a large growing network will only process information concerning a subset of existing nodes, since there is a cost associated with processing information. The new nodes will then decide with whom to link based on filtered information. From the standpoint proposed here, most models studied in the literature work under the unrealistic assumption of unfiltered information, i.e., a new node processes information about all the existing nodes in the network.

Here we consider for the first time the effect on network growth of filtering information due to limited information-processing capabilities. First, we calculate the in-degree distributions of web pages using two databases. The first database, which comprises $\approx 2 \times 10^{\dots}$ pages, surveys a very significant fraction of the entire WWW, while the second, which comprises $\approx 3 \times 10^{\dots}$ pages, covers the University of Notre Dame domain […], i.e., the set of URLs containing the string "nd.edu". For the first database, we calculate the cumulative in-degree distribution $P(k) = \sum_{k' > k} p(k')$, where $p(k)$ is the probability distribution of in-degrees. We confirm that the in-degree distribution decays as a power law […] of the form

$$P(k) \sim k^{-\gamma_{\rm in}} \tag{1}$$

with an exponent $\gamma_{\rm in} = 1.25 \pm 0.05$ (Fig. 1). Further, we find an exponential truncation of the scale-free behavior for $k > k_\times \approx 2 \times 10^5$, in contrast with the plateau reported in other studies […]. For the second database, we also find a power-law regime with the same exponent, but the exponential truncation appears to be absent, suggesting that the truncation is not due to the finite size of the databases.
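A minimal sketch of the cumulative distribution defined above, assuming the in-degrees are available as a plain list of integers (the function name `cumulative_distribution` is hypothetical). It follows the strict inequality $k' > k$ used in the text, so the largest observed degree gets $P = 0$; many plotting conventions use $k' \ge k$ instead, which avoids zeros on a logarithmic axis.

```python
from collections import Counter

def cumulative_distribution(degrees):
    """Return [(k, P(k))] sorted by k, with P(k) = sum over k' > k of p(k')."""
    n = len(degrees)
    counts = Counter(degrees)
    result = []
    remaining = n                  # nodes with degree >= current k
    for k in sorted(counts):
        remaining -= counts[k]     # now: nodes with degree strictly > k
        result.append((k, remaining / n))
    return result

# Usage: in-degrees of four hypothetical pages.
print(cumulative_distribution([1, 1, 2, 3]))  # [(1, 0.5), (2, 0.25), (3, 0.0)]
```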
FIG. 1: Distribution of the number of incoming links for the WWW. Cumulative in-degree distribution from two databases: the entire Web and the University of Notre Dame domain. We also plot a power-law function with exponent $\gamma_{\rm in} = 1.25$ (dashed line) and a Yule function […] of the form $k^{-\gamma_{\rm in}} \exp(-ak)$ (solid line). A cut-off degree $k_\times \approx 200{,}000$ is visible in the data.
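The source does not say how the Yule form in the caption was fitted. As an illustration only: taking logarithms gives $\ln P = c - \gamma_{\rm in} \ln k - ak$, which is linear in the unknowns $c$, $\gamma_{\rm in}$ and $a$, so ordinary least squares applies. The function name and the synthetic values below are hypothetical.

```python
import numpy as np

def fit_truncated_power_law(k, P):
    """Least-squares fit of ln P = c - gamma * ln k - a * k.

    Returns (gamma, a, c). Points with P <= 0 are skipped (log undefined).
    """
    k = np.asarray(k, dtype=float)
    P = np.asarray(P, dtype=float)
    mask = P > 0
    k, P = k[mask], P[mask]
    k_scale = k.max()              # rescale k so the columns have similar magnitude
    A = np.column_stack([np.ones_like(k), -np.log(k), -k / k_scale])
    coef, *_ = np.linalg.lstsq(A, np.log(P), rcond=None)
    c, gamma, a_scaled = coef
    return gamma, a_scaled / k_scale, c

# Usage on synthetic, noise-free data with gamma = 1.25 and k_x = 2e5 (a = 1/k_x).
k = np.logspace(0, 6, 50)
P = 3.0 * k ** -1.25 * np.exp(-k / 2e5)
print(fit_truncated_power_law(k, P))
```

Unweighted least squares on the logarithm of a cumulative distribution is simple but not statistically optimal; maximum-likelihood estimators are generally preferred for tail exponents of real, noisy data.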



To explain these empirical results, we hypothesize that the authors of new web pages filter some of the information regarding existing web pages; that is, the new nodes make linking decisions under information-filtering conditions. To investigate this process, we consider network growth models in which new nodes process information from only a fraction of existing nodes, which one may view as matching the "interests" of the new nodes. If the fraction $f$ of "interesting" nodes in the network is much less than one, then the attachment of new links is essentially a random process, so the generated network will be a random graph with an exponentially decaying in-degree distribution. In contrast, if $f \approx 1$, then preferential attachment is recovered and the in-degree distribution is scale-free.

We first define the network growth rule. At time $t = 0$, one creates $n_0$ nodes with $n_0 - 1$ links each. At each time step, one adds to the network a new node with $n_0 - 1$ outgoing links. These links can connect only to nodes in a randomly selected subset $\mathcal{C}$ containing $n(t) = (t + n_0) f$ nodes. The links to the nodes in the subset are selected according to the preferential attachment rule, i.e., the probability that node $i$ belonging to $\mathcal{C}$ is selected is proportional to the number of incoming links $k(i)$ to it:

$$p(i,t) = \frac{k(i)}{\sum_{j \in \mathcal{C}} k(j)}. \tag{2}$$
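A minimal sketch of the constant-$f$ growth rule in plain Python. Several details are assumptions not fixed by the text: each of the initial $n_0$ nodes starts with in-degree $n_0 - 1$; the subset size $n(t)$ is rounded to an integer and never allowed below $n_0 - 1$; a new node's $n_0 - 1$ links go to distinct nodes; and the attractiveness is $k(i) + 1$ rather than $k(i)$, because with Eq. (2) taken literally a node that has not yet received any link could never be selected. The function name `grow_network` is hypothetical.

```python
import random

def grow_network(S, n0, f, seed=None):
    """Grow a network to S nodes under constant-f information filtering.

    Returns a list with the in-degree of every node.
    """
    rng = random.Random(seed)
    m = n0 - 1                         # outgoing links per new node
    in_deg = [n0 - 1] * n0             # assumed initial in-degrees
    while len(in_deg) < S:
        existing = len(in_deg)         # t + n0 nodes at time t
        # Subset C of n(t) = (t + n0) f randomly chosen existing nodes.
        size = min(existing, max(m, round(existing * f)))
        subset = rng.sample(range(existing), size)
        # Preferential attachment restricted to C (Eq. (2), using k + 1).
        weights = [in_deg[i] + 1 for i in subset]
        targets = set()
        while len(targets) < m:        # draw until m distinct targets
            targets.add(rng.choices(subset, weights=weights)[0])
        for i in targets:
            in_deg[i] += 1
        in_deg.append(0)               # the new node has no incoming links yet
    return in_deg

# Usage: a small network with f = 0.01; small f should truncate the tail.
degrees = grow_network(20_000, 2, 0.01, seed=1)
print(max(degrees))
```

Each step costs time proportional to $n(t)$, so runs with $f = 1$ and large $S$ are slow; this is an unoptimized sketch meant to make the rule concrete.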



In Fig. 2(a), we show our numerical results for the cumulative in-degree distributions of networks with $S = 5 \times 10^{\dots}$ nodes and $n_0 \sim 1$, for a sequence of $f$ values. For $f = 1$, we reproduce the results reported for the scale-free model, i.e., we observe an in-degree distribution that decays as a power law with an exponent $\gamma_{\rm in} \approx 2$. For $f < 10^{\dots}$, we observe a crossover at $k = k_\times$ from power-law behavior to exponential behavior.

FIG. 2: In-degree cumulative probability distributions $P(k)$ under information filtering. Constant-$f$ case: (a) Results for $S = 5 \times 10^{\dots}$ and different values of $f$. (b) Results for $f = 10^{\dots}$ and different values of $S$. Panels (a) and (b) show that $k_\times$ increases with both $f$ and $S$. (c) Data collapse of the numerical results according to Eq. (3) with $\gamma_{\rm in} = 1.97 \pm 0.05$ and $\theta_1 = 0.45 \pm 0.04$. Constant-$n$ case: (d) Results for $S = 5 \times 10^{\dots}$ and different values of $n$, showing the decrease in the cut-off degree $k_\times$ with decreasing $n$. (e) Results for $n = 2$, 10 and 1,000 for different values of $S$, showing that $P(k)$ does not depend on $S$. (f) Data collapse according to Eq. (4) with $\gamma_{\rm in} = 2.00 \pm 0.03$ and $\theta_2 = 0.65 \pm 0.04$. (In panels (c) and (f), the horizontal axes are $k/(Sf)^{\theta_1}$ and $k/n^{\theta_2}$, respectively.)

To further investigate what controls the cut-off degree $k_\times$, we plot in Fig. 2(b) the in-degree distributions for different network sizes $S$ and a fixed value of $f$. We find that $k_\times$ increases as a power law with $S$. All of our numerical results can be expressed compactly by the scaling form



$$P(k, f, S) \propto k^{-\gamma_{\rm in}}\, \mathcal{F}_1\!\left(\frac{k}{k_\times}\right) \tag{3}$$

with $k_\times \sim (Sf)^{\theta_1}$. We find $\gamma_{\rm in} = 1.97 \pm 0.05$, $\theta_1 = 0.45 \pm 0.04$, and $\mathcal{F}_1(x) \sim$ const. for $x \ll 1$, $\mathcal{F}_1(x) \sim e^{-x}$ for $x \gg 1$. As a test of the scaling form of Eq. (3), we plot in Fig. 2(c) the scaled cumulative distribution versus the scaled in-degree. The figure confirms our scaling Ansatz, since all data "collapse" onto a single curve, the scaling function $\mathcal{F}_1(x)$.
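The data collapse behind Fig. 2(c) is a change of variables. A minimal sketch, assuming the cumulative distribution is available as arrays `k` and `P` and using one common choice of scaled axes: plot $y = P(k)\,k^{\gamma_{\rm in}}$ against $x = k/k_\times$ with $k_\times = (Sf)^{\theta_1}$. Proportionality constants are dropped, so curves coincide up to an overall factor. The function name is hypothetical; replacing $Sf$ by $n$ and $\theta_1$ by $\theta_2$ gives the constant-$n$ version.

```python
import numpy as np

def collapse(k, P, gamma_in, scale, theta):
    """Return scaled variables (x, y) for a data-collapse plot.

    x = k / k_cross with k_cross = scale**theta, and y = P(k) * k**gamma_in.
    For constant f use scale = S * f, theta = theta_1; for constant n use
    scale = n, theta = theta_2.
    """
    k = np.asarray(k, dtype=float)
    P = np.asarray(P, dtype=float)
    k_cross = scale ** theta
    return k / k_cross, P * k ** gamma_in

# Usage: two synthetic curves with different cut-offs map onto one function.
k = np.array([5.0, 10.0, 20.0, 40.0])
x1, y1 = collapse(k, k ** -2.0 * np.exp(-k / 10.0), 2.0, 100.0, 0.5)
x2, y2 = collapse(k, k ** -2.0 * np.exp(-k / 20.0), 2.0, 400.0, 0.5)
```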

We consider next a situation in which new nodes process information not from a constant fraction $f$ of nodes but from a constant number $n$ of nodes. That is, as the network grows, the new nodes are able to process information about an ever smaller fraction of existing nodes. This model may be more plausible for networks that have grown to a very large size, since a fixed fraction $f$ of all nodes then represents a very large number of nodes. In the case of the scientific literature, this effect leads to the fragmentation of a scientific field as it grows […].

FIG. 3: Dependence of the in-degree distribution exponent $\gamma_{\rm in}$ on the out-degree distribution exponent $\gamma_{\rm out}$. We show results for models (i) without fitness ($\eta(i) =$ const.) and (ii) with fitness ($\eta(i)$ uniformly distributed). For the former case, $\gamma_{\rm in}$ initially increases approximately linearly with $\gamma_{\rm out}$ and then saturates at $\gamma_{\rm in} \approx 2$ for $\gamma_{\rm out} > 2$. This saturation is to be expected, as $\gamma_{\rm in} = 2$ for the case of a peaked distribution of $n_0$. For the latter case, $\gamma_{\rm in}$ initially increases approximately linearly with $\gamma_{\rm out}$ and then saturates at $\gamma_{\rm in} \approx 1.25$ for $\gamma_{\rm out} > 1.9$. This saturation is to be expected, as $\gamma_{\rm in} = 1.255$ for the case of a peaked distribution of $n_0$ […].

For the constant-$n$ case, the fraction of known nodes at time $t$ is $f(t) = n/(t + n_0)$, implying that as the network grows there are two antagonistic trends affecting $k_\times$. The first is a tendency to increase due to the growing size of the network, and the second is a tendency to decrease due to the decreasing value of $f$. Hence, one may hypothesize that there will be a characteristic network size $S_c$ above which $k_\times$ will no longer depend on $S$.
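The two filtering schemes differ only in how the processed subset scales with time. A small sketch (function names hypothetical) makes the contrast explicit: with constant $f$ the subset grows as $(t + n_0) f$, while with constant $n$ the fraction $f(t) = n/(t + n_0)$ shrinks as the network grows.

```python
def subset_size_constant_f(t, n0, f):
    """Constant-f filtering: a new node at time t processes (t + n0) * f nodes."""
    return (t + n0) * f

def fraction_constant_n(t, n0, n):
    """Constant-n filtering: the processed fraction f(t) = n / (t + n0) decreases with t."""
    return n / (t + n0)

# Compare the two schemes as the network grows (n0 = 2, f = 0.01, n = 10).
for t in (998, 9_998, 99_998):
    size = t + 2
    print(size, subset_size_constant_f(t, 2, 0.01), fraction_constant_n(t, 2, 10))
```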

We now test these arguments with numerical simulations. In Fig. 2(d)-(e), we show our results for growing networks in which new nodes process information only from $n$ randomly selected existing nodes. We find, in agreement with our scaling arguments, that for $S \gg S_c$ the in-degree distribution obeys the scaling relation

$$P(k, n, S) \propto k^{-\gamma_{\rm in}}\, \mathcal{F}_2\!\left(\frac{k}{k_\times}\right) \tag{4}$$

with $k_\times \sim n^{\theta_2}$, $\gamma_{\rm in} = 2.00 \pm 0.03$, $\theta_2 = 0.65 \pm 0.04$, and where the scaling function $\mathcal{F}_2(x)$ has the same limiting behavior as $\mathcal{F}_1(x)$. To test the scaling form of Eq. (4), we plot in Fig. 2(f) the scaled cumulative distribution versus the scaled in-degree. This confirms our scaling Ansatz, since the data collapse onto a single curve, the scaling function $\mathcal{F}_2(x)$.

Comparison of the two scaling relations, Eqs. (3) and (4), reveals an unexpected result. By replacing $Sf$ by $n$ in Eq. (3), one would naively expect to obtain Eq. (4) with $\theta_1 = \theta_2$ and $\mathcal{F}_1(x) = \mathcal{F}_2(x)$. Surprisingly, we find that $\theta_1$ is significantly different from $\theta_2$ and that $\mathcal{F}_1(x)$ is significantly different from $\mathcal{F}_2(x)$. To understand this result, consider two growing networks that have reached size $S$. For the first, new nodes process information from a fraction $f$ of existing nodes, while for the second, new nodes process information from $n = fS$ existing nodes. At a time $t$, before the network has reached its final size $S$, there are $t + n_0 < S$ nodes, and for the first network the preferential attachment acts on a number of nodes $(t + n_0) f < Sf = n$. The preferential attachment mechanism can operate effectively only when it acts on a number of nodes comparable to $S$, so the fact that, for the first network, new nodes have always processed information from fewer existing nodes suggests that the first network will not develop nodes with as large a degree as the second network. Thus, we expect that (i) the two resulting networks have different in-degree distributions, and (ii) the in-degree distribution for fixed $f$ has a sharper truncation and a smaller cut-off than for fixed $n$, which is indeed what we find.
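A short calculation makes this comparison quantitative, assuming one node is added per time step from $t = 0$ until the network reaches size $S$, so that $t$ runs from $0$ to $S - n_0 - 1$. Averaged over the whole growth, the subset processed under constant $f$ has size

$$\langle n(t) \rangle = \frac{1}{S - n_0} \sum_{t=0}^{S - n_0 - 1} (t + n_0) f = \frac{(S + n_0 - 1) f}{2} \approx \frac{Sf}{2} = \frac{n}{2} \qquad (S \gg n_0),$$

so over its history the constant-$f$ network applies preferential attachment, on average, to about half as many nodes as the constant-$n$ network, which always uses $n$ nodes. This is only a rough illustration of the argument above, not a derivation of $\theta_1$ or $\theta_2$.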

Our numerical results are in qualitative agreement with the empirical data. However, the value of the power-law exponent $\gamma_{\rm in} \approx 1.25$ found for the WWW is significantly smaller than the value $\gamma_{\rm in} = 2$ predicted by the model. This fact raises the question of how information filtering affects models that generate an in-degree distribution closer to the empirical one. To answer this question, we investigate two possible explanations for the observed value 1.25.

(i) Effect of the out-degree distribution on $\gamma_{\rm in}$. The scale-free model is missing an important ingredient: a heterogeneous distribution of the number of outgoing links. Indeed, the out-degree distribution considered so far is restricted to a single value $m = n_0 - 1$, i.e., $P_{\rm out}(m) = \delta_{m, n_0 - 1}$, while for the empirical WWW data it decays as a power law of the form $P_{\rm out}(m) \sim m^{-\gamma_{\rm out}}$ with $\gamma_{\rm out} = 1.68 \pm 0.05$. We show in Fig. 3 the computed value of the exponent $\gamma_{\rm in}$ of the in-degree distribution as a function of $\gamma_{\rm out}$ […]. We find that $\gamma_{\rm in}$ increases approximately linearly with increasing values of the exponent $\gamma_{\rm out}$ until it reaches the limiting value $\gamma_{\rm in} = 2$. For $\gamma_{\rm out} \approx 1.7$, which is the empirically observed value for the WWW, we find $\gamma_{\rm in} \approx 1.8$, which does not agree with the empirical value of 1.25, so the power-law decaying out-degree distribution alone cannot explain the results obtained for the WWW.
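The text does not specify how heterogeneous out-degrees were generated. One standard option, sketched here as an assumption, is inverse-transform sampling: if $u$ is uniform on $(0, 1]$, then $m = \lfloor u^{-1/\gamma_{\rm out}} \rfloor$ satisfies $\Pr(M \ge m) = m^{-\gamma_{\rm out}}$ for integers $m \ge 1$, i.e., a cumulative tail with exponent $\gamma_{\rm out}$. The function name and the optional cap `m_max` (for example, to keep a new node from requesting more links than there are nodes in its subset $\mathcal{C}$) are hypothetical.

```python
import math
import random

def sample_out_degree(gamma_out, rng, m_max=None):
    """Draw an out-degree m >= 1 with Pr(M >= m) = m**(-gamma_out).

    Inverse-transform sampling of a continuous Pareto variable, then floor.
    """
    u = 1.0 - rng.random()                  # uniform on (0, 1]; avoids u = 0
    m = math.floor(u ** (-1.0 / gamma_out))
    if m_max is not None:
        m = min(m, m_max)                   # optional cap (assumption)
    return m

# Usage: out-degrees for five new nodes with the empirical WWW exponent.
rng = random.Random(0)
print([sample_out_degree(1.68, rng) for _ in range(5)])
```

In a growth simulation, each new node would use its own sampled $m$ in place of the fixed $n_0 - 1$.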

(ii) Effect of fitness on $\gamma_{\rm in}$. The preferential attachment mechanism is modified by a "fitness" factor: nodes have different fitness, and fitter nodes are more likely to receive incoming links than less fit nodes with the same value of $k$. Uniformly distributed fitness is known to lead to a smaller exponent $\gamma_{\rm in} = 1.255$ […], which is quite close to the value measured for the WWW. Hence, we assign to each node a fitness $\eta(i)$ […], reflecting the fact that for equal values of $k$ some nodes are more "attractive" than others […]. The probability that a new node will link to node $i$ is

$$p(i,t) = \frac{\eta(i)\, k(i)}{\sum_{j \in \mathcal{C}} \eta(j)\, k(j)}. \tag{5}$$

We consider here the case in which $\eta(i)$ is a uniformly distributed random variable. Figure 3 shows that the in-degree distribution decays as a power law with values of $\gamma_{\rm in} < 1.25$. For $\gamma_{\rm out} > 1.9$, the exponent approaches the limiting value $\gamma_{\rm in} \approx 1.25$. Interestingly, for $\gamma_{\rm out} \approx 1.7$, the empirical value for the WWW, we find $\gamma_{\rm in} \approx 1.2$, in agreement with the empirical value $\gamma_{\rm in} \approx 1.25$.
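Eq. (5) changes only the weights used when picking a target inside $\mathcal{C}$. A minimal sketch, assuming fitness drawn uniformly from $(0, 1]$ (the interval is not stated in the text) and an attractiveness list `k` supplied by the caller; the function names are hypothetical. Read literally, Eq. (5) gives zero weight to a node with $k(i) = 0$, so a full simulation would need some offset such as $k(i) + 1$.

```python
import random

def assign_fitness(num_nodes, rng):
    """Draw a fitness eta(i) for each node, uniform on (0, 1] (assumed interval)."""
    return [1.0 - rng.random() for _ in range(num_nodes)]

def choose_target(subset, k, eta, rng):
    """Pick one node from subset C with probability eta(i) k(i) / sum_j eta(j) k(j)."""
    weights = [eta[i] * k[i] for i in subset]
    return rng.choices(subset, weights=weights)[0]

# Usage: equal degrees, so the fitter node is chosen more often.
rng = random.Random(3)
picks = [choose_target([0, 1], [1, 1], [1.0, 3.0], rng) for _ in range(1_000)]
print(picks.count(1) / len(picks))
```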

Our results for the model with fitness show that information filtering and node fitness are both necessary in order to approximate the empirical results. An open question is which type of filtering is more appropriate for the WWW: constant $f$ or constant $n$? To answer this question, one would need WWW data for a different sample size, which are not available to us at present. However, given the sheer size of the WWW, it seems plausible that constant $n$ would be the more appropriate case.

Our key finding is that limited information-processing capabilities have a significant and quantifiable effect on the large-scale structure of growing networks. We find that information filtering leads to an exponential truncation of the in-degree distribution for networks growing under conditions of preferential attachment. Surprisingly, we find simple scaling relations that predict the in-degree distribution in terms of (i) the information-processing capabilities available to the nodes, and (ii) the size of the network.

We also quantify the effect of a heterogeneous out-degree distribution on the in-degree distribution of networks growing under conditions of preferential attachment. We find that for a power-law decaying out-degree distribution with exponent $\gamma_{\rm out} < 2$, the exponent $\gamma_{\rm in}$ characterizing the tail of the in-degree distribution will take values smaller than those predicted by theoretical calculations […].

The exponential truncation we find may have dramatic effects on the dynamics of the system, especially for processes in which the nodes with the largest degree play important roles. This is the case, for example, for virus spreading […]: for networks with exponentially truncated in-degree distributions there is a non-zero threshold for the appearance of an epidemic. In contrast, scale-free networks are prone to the spreading and persistence of infections no matter how small the spreading rate. Our finding of a mechanism leading to an exponential truncation, even for systems where none was expected before […], indicates that the most connected nodes will have a smaller degree than predicted for scale-free networks, possibly leading to different dynamics, e.g., for the initiation and spread of epidemics.
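One way to make the epidemic argument concrete, assuming the standard heterogeneous mean-field result for the SIS model on uncorrelated networks (not derived in this paper), is the threshold $\lambda_c = \langle k \rangle / \langle k^2 \rangle$. The sketch below evaluates it for $p(k) \propto k^{-3} e^{-k/k_\times}$, using the in-degree distribution as a stand-in for the full degree distribution; an exponent of 3 for $p(k)$ corresponds to the cumulative exponent $\gamma_{\rm in} = 2$ found for $f = 1$. The function name and the degree range are hypothetical. A finite $k_\times$ keeps $\langle k^2 \rangle$ finite and hence $\lambda_c > 0$, whereas without truncation $\langle k^2 \rangle$ keeps growing with the largest degree and $\lambda_c$ drifts toward zero.

```python
import numpy as np

def sis_threshold(exponent, k_cross, k_max=10**6):
    """Mean-field SIS threshold <k>/<k^2> for p(k) ~ k**(-exponent) * exp(-k / k_cross).

    Degrees run from 1 to k_max; pass k_cross=np.inf for an untruncated power law.
    """
    k = np.arange(1, k_max + 1, dtype=float)
    p = k ** (-exponent) * np.exp(-k / k_cross)
    p /= p.sum()                              # normalize the distribution
    return (k * p).sum() / (k ** 2 * p).sum()

# Usage: the threshold falls as the cut-off degree grows.
for k_cross in (10, 100, 1_000, np.inf):
    print(k_cross, sis_threshold(3.0, k_cross))
```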

In the context of network growth, the impossibility of knowing the degrees of all the nodes comprising the network due to the filtering process, and hence the inability to make the optimal, rational choice, is not altogether unlike the "bounded rationality" concept of Simon […]. Remarkably, it appears that for the description of WWW growth, the preferential attachment mechanism, originally proposed by Simon […], must be modified along the lines of another concept also introduced by him: bounded rationality.